Version Control

Monday, May 6

Today we will…

  • Open-Ended Analysis from Week 5
  • Questions about Midterm?
  • Final Project Group Formation
  • New Material
    • git/GitHub
    • Connect GitHub to RStudio
  • PA 6: Merge Conflicts – Collaborating within a GitHub Repo

Open-Ended Analysis from Week 5

Written Description

  • Start with an intro to the data and topic.
  • Intersperse written description and table / plot output!
  • Do not use variable names or R function names in your written text.
  • Breaking up sections with headers can help with organization and flow.
  • Do not print out the data!

“Table of Summary Statistics”

  • Not asking for inferential statistics!
  • E.g., we can construct a table of the mean amount of protein per manufacturer and shelf.
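library(tidyverse)

# Mean protein for each manufacturer-shelf combination
cereal |> 
  group_by(manuf, shelf) |> 
  summarize(mean_protein = mean(protein)) |> 
  # Sort by shelf so the widened columns appear in shelf order
  arrange(shelf) |> 
  # One row per manufacturer, one column per shelf
  pivot_wider(id_cols = manuf,
              names_from = shelf,
              values_from = mean_protein,
              names_prefix = "Shelf_")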
# A tibble: 7 × 4
# Groups:   manuf [7]
  manuf Shelf_1 Shelf_2 Shelf_3
  <fct>   <dbl>   <dbl>   <dbl>
1 G        3       1.29    2.67
2 K        2.75    2.14    2.92
3 N        2.67    2.5     4   
4 P        1.5     1       3   
5 Q        5       2       2.5 
6 R        2      NA       3   
7 A       NA       4      NA   

Table Design

  • Think about the number of rows/columns – is it readable?
  • Change row/column names to be understandable.

Plot Design

  • What type of plot will best display the data?
  • What order of elements will best display the comparison you want to make?
  • Think about: colors, order of categories, if a legend is needed, etc.

Questions about Midterm Exam?

Final Project Group Formation

You will be completing a final project in teams of four.

  • Group Formation Survey due Monday, 5/13 at 11:59pm
    • Help me gather information about your preferences and work styles to facilitate team formation.
    • Your teammates do not all need to be in the same section, but being in the same section may be useful for in-class work time.
  • Group Contracts (5/20)
  • Project Proposal (5/28)
  • Linear Regression (6/3)
  • Final Deliverable (6/12)

What is version control?


A process of tracking changes to a file or set of files over time so that you can recall specific versions later.

git/GitHub Basics

Git vs GitHub

Git

  • A system for version control that manages a collection of files in a structured way.
  • Uses the command line or a GUI.
  • Git is local.

GitHub

  • A cloud-based service that lets you use git across many computers.
  • Basic services are free; advanced services are paid (like RStudio!).
  • GitHub is remote.

Why Learn GitHub?

  1. GitHub provides a structured way to track changes to files over the course of a project.
    • Think Google Docs or Dropbox history, but more structured and powerful!
  2. GitHub makes it easy for multiple people to work on the same files at the same time.
  3. You can host fun things (like the class text, these slides, or a personal website) at a URL with GitHub Pages.

Git Repositories

Git is based on repositories.

  • Think of a repository (repo) as a directory (folder) for a single project.
    • This directory will likely contain code, documentation, data, to do lists, etc. associated with the project.
    • You can link a local repo with a remote copy.

  • To create a repository, you can start with your local computer or you can start with the remote copy.

.gitignore

Sometimes there are files that you do not want to track.

  • A .gitignore file specifies the files that git should intentionally ignore.
  • Often these are machine generated files (e.g., /bin, .DS_Store) or files/directories that you do not want to be shared (e.g., solutions/).
  • We want to ignore .Rproj files!
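
As a minimal sketch of what such a file might contain, the base R code below writes a .gitignore into a temporary folder. The folder name and the exact patterns are illustrative assumptions drawn from the examples above, not a required list.

```r
# Hypothetical project folder inside R's temporary directory
repo_dir <- tempfile("gitignore-demo-")
dir.create(repo_dir)

# One pattern per line; git skips any file or folder that matches
ignore_patterns <- c(
  ".DS_Store",   # machine-generated macOS file
  "*.Rproj",     # RStudio project files
  "solutions/"   # a directory we do not want to share
)

ignore_file <- file.path(repo_dir, ".gitignore")
writeLines(ignore_patterns, ignore_file)

# Read the file back to confirm what git will see
ignore_contents <- readLines(ignore_file)
ignore_contents
```

In a real project the file lives in the top-level folder of the repo, and you would usually edit it by hand or through RStudio rather than write it from code.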

Actions in Git

Cloning a Repo


Create an exact copy of a remote repo on your local machine.

Committing Changes

Tell git you have made changes you want to add to the repo.

  • Also provide a commit message – a short label describing what the changes are and why they exist.

The red line is a change we commit (add) to the repo.

The log of these changes (and the file history) is called your git commit history.

  • You can always go back to old copies!

Commit Tips

  • Use short, but informative commit messages.
  • Commit small blocks of changes – commit every time you accomplish a small task.
    • You’ll have a set of bite-sized changes (with description) to serve as a record of what you’ve done.
    • With frequent commits, it's easier to find the issue when you mess up!
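
To make the idea of small, labeled commits concrete, here is a rough sketch using the gert package, which is an assumption on my part (the slides themselves use RStudio and usethis). The repo location, file name, author details, and commit messages are all hypothetical.

```r
# Assumes gert is installed: install.packages("gert")
library(gert)

# Hypothetical throwaway repo in a temporary directory
repo_dir <- tempfile("commit-demo-")
dir.create(repo_dir)
git_init(repo_dir)

# Hypothetical author details, used only for this demo
me <- git_signature("Jane Doe", "jane@example.org")
script <- file.path(repo_dir, "analysis.R")

# Small task 1: start the script, then commit it
writeLines("library(tidyverse)", script)
git_add("analysis.R", repo = repo_dir)
git_commit("Load tidyverse", author = me, repo = repo_dir)

# Small task 2: add one more line, then commit again
writeLines(c("library(tidyverse)",
             "cereal <- read_csv(\"cereal.csv\")"),
           script)
git_add("analysis.R", repo = repo_dir)
git_commit("Read in cereal data", author = me, repo = repo_dir)

# The commit history: one row per commit, newest first
history <- git_log(repo = repo_dir)
history[, c("commit", "message")]
```

Each row of the history pairs a snapshot of the files with its message, which is what lets you go back to an old copy later.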

Pushing Changes


Update the copy of your repo on GitHub so it has the most recent changes you’ve made on your machine.

Pulling Changes


Update the local copy of your repo (the copy on your computer) with the version on GitHub.

Pushing and Pulling

Workflow

When you have an existing local repo:

  1. Pull the repo (especially if collaborating).
  2. Make some changes locally.
  3. Commit the changes to git.
  4. Pull any changes from the remote repository (again!).
  5. Resolve any merge conflicts.
  6. Push your changes to GitHub.
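
The six steps above could be sketched as two helper functions, again assuming the gert package. The function names start_session() and finish_session() are hypothetical, and in class you will more likely click the equivalent buttons in the RStudio Git pane.

```r
# Assumes gert is installed and the repo already has a GitHub remote.
# start_session() and finish_session() are hypothetical helper names.

start_session <- function(repo = ".") {
  # Step 1: pull before you start working
  gert::git_pull(repo = repo)
}

finish_session <- function(files, message, repo = ".") {
  # Step 3: commit the local changes made in step 2
  gert::git_add(files, repo = repo)
  gert::git_commit(message, repo = repo)

  # Step 4: pull again in case a collaborator pushed in the meantime
  gert::git_pull(repo = repo)

  # Step 5: stop if the pull left conflicts that need fixing by hand
  conflicts <- gert::git_conflicts(repo = repo)
  if (nrow(conflicts) > 0) {
    stop("Resolve the merge conflicts, commit, and then push.")
  }

  # Step 6: send the commits to GitHub
  gert::git_push(repo = repo)
}
```

A session would then look like start_session(), some edits, and finish_session("analysis.R", "Add protein table"), where the file name and message are placeholders.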

Merge Conflicts

These occur when git encounters conflicting changes.


  1. Maybe you are working in real time on the same line of code or text as a collaborator.
  2. Maybe you forgot to push your changes the last time you finished working.
  3. Maybe you forgot to pull your collaborators' changes before you started working this time.
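
When a conflict happens, git writes both versions into the file, separated by marker lines. The base R sketch below imitates such a file as a character vector and resolves it by keeping one version. The code lines and the commit id after the last marker are made up for illustration.

```r
# A hypothetical conflicted file, one element per line
conflicted <- c(
  "x <- 1",
  "<<<<<<< HEAD",                  # start of your local version
  "y <- mean(cereal$protein)",
  "=======",                       # divider between the two versions
  "y <- median(cereal$protein)",
  ">>>>>>> 1a2b3c4",               # end of the incoming version (made-up id)
  "z <- 3"
)

# Marker lines start with seven <, =, or > characters
is_marker <- grepl("^(<<<<<<<|=======|>>>>>>>)", conflicted)

# Resolve by keeping the local line and dropping the incoming one
their_line <- "y <- median(cereal$protein)"
resolved <- conflicted[!is_marker & conflicted != their_line]
resolved
```

In practice you make this choice by editing the file yourself: delete the markers, keep the lines you and your collaborator agree on, then commit the result.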

We will work on resolving merge conflicts today!


But when all else fails…

burn your local repo to the ground and clone again.

Tips for Avoiding Merge Conflicts

  • Always pull before you start working and always push after you are done working!
    • If you do this, you will only have problems if two people are making local changes to the same line in the same file at the same time.
  • If you are working with collaborators in real time, pull, commit, and push often.
  • Git commits lines – lines of code, lines of text, etc.
    • Practice good code format – no overly long lines!
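
To see why short lines help, the sketch below compares two hypothetical versions of the same pipeline line by line. It is a simplification of how git compares files (it assumes both versions have the same number of lines), but it shows that a small edit touches only one line when the code is split up.

```r
# Two hypothetical versions of the same script, one element per line
old_version <- c(
  "cereal |>",
  "  group_by(manuf, shelf) |>",
  "  summarize(mean_protein = mean(protein))"
)
new_version <- c(
  "cereal |>",
  "  group_by(manuf, shelf) |>",
  "  summarize(mean_protein = median(protein))"
)

# Which line numbers differ between the two versions?
changed_lines <- which(old_version != new_version)
changed_lines
```

If the whole pipeline sat on one long line, any edit by you and any edit by a collaborator would land on the same line, which is the situation that produces a merge conflict.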

Connect GitHub to RStudio

Install + Load R Packages

Work in your console or an R script for this.

  1. Install and load the usethis package.
install.packages("usethis")
library(usethis)
  2. Install and load the gitcreds package.
install.packages("gitcreds")
library(gitcreds)

Configure git

  1. Tell git your email and GitHub username.
use_git_config(user.name = "JaneDoe2", user.email = "jane@example.org")

(You should see no output.)

Generate your Personal Access Token

  1. Generate a PAT.
create_github_token()
  • This will open GitHub and ask you to log in.
  • Fill in a Note and an Expiration (AT LEAST 60 days from now).
  • Click Generate Token.

Store your PAT

  1. Copy your PAT.

  2. Run the following code.
gitcreds_set()

When prompted to Enter password or token:, paste your PAT.

Verify your PAT

  1. Let’s verify.
git_sitrep()

PA 6: Merge Conflicts

You will be completing this activity in groups of 4.

IMPORTANT

This activity will only work if you follow the directions in the exact order I have specified. Do not work ahead of your group members!

To do…

  • PA 6: Merge Conflicts
    • Due Monday, 5/6 at 11:59pm – TODAY.
  • Midterm Exam
    • Wednesday, 5/8 + 24 hours.
  • Final Project Group Formation Survey
    • Due Monday, 5/13 at 11:59pm

Office Hours

Tuesday from 1:00-2:00pm and Wednesday 10:30-11:30am. None on Thursday or Friday.

Wednesday, May 8 – Midterm Exam

  • Please grab separators from the sides of the room as you enter.

  • I will pass out a hard copy of the exam.

  • Canvas will unlock the .qmd template at the beginning of class.

Section 1: General Questions

  • You cannot work on Section 2 until you submit Section 1.

Section 2: Short Answer

  • Download .qmd template from Canvas.
  • Submit .qmd and .html files on Canvas by the end of class.

Section 3: Open-Ended Analysis

  • Create your own .qmd file.
  • Submit .qmd and .html files within 24 hours after the end of class.

To do…

  • Read Chapter 7: Writing Functions
    • Check-in 7.1 due Monday, 5/13 at 10:00am
  • Final Project Group Formation
    • Due Monday, 5/13 at 11:59pm